-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pd.concat produces frames with inconsistent order when concating the ones with categorical indices #46019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the example from the original OP as well and put a test in pandas/tests/reshape/test_concat.py
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -267,6 +267,7 @@ Categorical | |||
^^^^^^^^^^^ | |||
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) | |||
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) | |||
- Bug in :meth:`CategoricalIndex._concat` when concatenating categorical indexes that have the same categories returning indexes in improper order (:issue:`44099`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you refer to pd.concat
as that's the most user visible bug here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you for review!
I added description in this commit(aacc75c)
Thanks! I added the test case in this commit(a66a60f) |
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"] | ||
), | ||
) | ||
tm.assert_frame_equal(concat([df1, df2]), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
result = concat([df1, df2])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I changed at this commit 91d2155
expected = CategoricalIndex( | ||
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"] | ||
) | ||
tm.assert_index_equal(ci1.append(ci2), expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result=ci1.append(ci2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also changed at this commit 91d2155
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -267,6 +267,7 @@ Categorical | |||
^^^^^^^^^^^ | |||
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) | |||
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) | |||
- Bug in :meth:`CategoricalIndex._concat` when concatenating categorical indexes that have the same categories returning indexes in improper order; this method can be called in :meth:`concat` (:issue:`44099`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the note here; this should be about the user facing case in the OP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I added description line at reshaping
section that refer to concat
func in this commit 3425b85
Is this the correct fix that you pointed out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Hi! Thanks for all your reviews.
I would like to request a review from you again.
I finally change doc/source/whatsnew/v1.5.0.rst in this commit(0e62a36)
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"] | ||
), | ||
) | ||
result = concat([df1, df2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make an asv for this type of concatanation and then show how it performs previously. i am worried that the list conversion of the codes is very expensive.
Thanks for the pull request, but it appears to be stale. If interested in continuing, please merge in the main branch, address the review, and we can reopen. |
pd.concat
produces frames with inconsistent order when concating the ones with categorical indices #44099doc/source/whatsnew/v1.5.0.rst
file if fixing a bug or adding a new feature.